ODC  file  copy  ^0848 


Systems 
Optimization 
| _ aboratory 


11^70.  f/n 


THE  IMPLEMENTATION  OF  A  LAGRANGIAN-BASED  ALGORITHM 
FOR  SPARSE  NONLINEAR  CONSTRAINTS' 

by 

B.A.  Murtagh^  and  M.A.  Saunders 

TECHNICAL  REPORT  SOL  80-1 
January  1980 


DTIC 

ELECTED 

MAY  2  8  1980  ' 


Department  of  Operations  Research 
Stanford  University 
Stanford,  CA  94305 


DOTUBUTIOW  STATEMENT  A  T 

Approved  tot  public  release; 
Distribution  Unlimited 


jf 


80  5  27*288 


‘■S*fcShkk.. 


SYSTEMS  OPTIMIZATION  LABORATORY 
DEPARTMENT  OF  OPERATIONS  RESEARCH 
Stanford  University 
Stanford,  California 
94305 


THE  IMPLEMENTATION  OF  A  LAGRANGIAN-BASED  ALGORITHM 
FOR  SPARSE  NONLINEAR  CONSTRAINTS’1" 


T 

B.A.  Murtagh  and  M.A.  Saunders 


TECHNICAL  REPORT  SOL  80-1 
January  1980 


TOE  r*  ,V,  OP1NTCNS  AND  -- n'V'"  CONTA'NED  IV  THIS  REPORT 

are  tmose  c-c  mr  ?.  am-  r j  -r-  s*  construed  as 

OFFICIAL  CEPr  ":  T  !•  TH_  -\f.KU  PGS’ltuN.  POLICY,  OR  DE¬ 

CISION,  UNLESS  GO  DESIGNATED  DY  OTHER  DOCUMENTATION. 


tPresented  at  the  Tenth  International  Symposium  on  Mathematical 
Programming,  Montreal,  August  27-31,  1979, 

tThe  University  of  New  South  Wales,  Sydney,  Australia. 

Research  and  reproduction  of  this  report  were  Dartially  supported 

*Eue!y^0ntraCt  DE-AS03-76^F00326,  PA  Number 
8;  the  Natlonal  Science  Foundation  Grants 
MCS76-2001 9  A01  and  ENG77-06761;  the  U.S.  Army  Research  Office 

and  the  APP^ed  Mathematics  Oivtsion, 

OS  nt.^New  TeaTa  nd . 

Reproduction  in  whole  or  in  part  is  permitted  for  any  purpose  of 
the  United  States  Government.  This  document  has  been  approved 
for  public  release  and  sale;  its  distribution  is  unlimited. 


CONTENTS 


1 .  INTRODUCTION  .  1 

1.1.  Subproblems  .  2 

2.  BRIEF  DESCRIPTION  OF  MINOS  .  3 

2.1.  Some  Aspects  of  the  Algorithm  Used  in  MINOS  .  5 

2.2.  Key  Points  . 7 

3.  EXTENSION  TO  NONLINEAR  CONSTRAINTS  .  8 

3.1.  Statement  of  the  Problem  .  8 

3.2.  The  Linearized  Subproblem  .  10 

3.3.  Choice  of  .  11 

3.4.  Choice  of  p  .  13 

3.5.  Summary  of  Procedure  .  18 

4.  COMPUTER  IMPLEMENTATION  .  19 

4.1.  Sparse  Matrices  .  19 

4.2.  Infeasible  Subproblems  .  19 

4.3.  User  Options  .  21 

4.4.  Subroutines  CALCFG  and  CALCON  .  21 

4.5.  Jacobian  Option  .  22 

4.6.  Partial  Completion  .  23 

4.7.  Lagrangian  Option,  Penalty  Parameter  .  24 

4.8.  Convergence  Conditions  .  25 

5.  TEST  PROBLEMS  .  2  7 

5.1.  Colville  No.  2  27 

5.2.  Colville  No.  3  27 

5.3.  Colville  No.  8  28 

5.4.  Powell's  Problem  .  28 

5.5.  Power  Scheduling  .  29 

5.6.  Launch  Vehicle  Design  .  29 

5.7.  Quartic  with  Quartic  Constraints  . 29 

5.8.  Dembo  No.  7  . 30 

5.9.  Wright  No.  4  30 

5.10.  Wright  No.  9  31 

5.11.  Optimal  Control  .  32 

5.12.  Economic  Growth  Model  .  33 


_1 


™*T" 


CONTENTS  Cone. 


6.  RESULTS  AND  DISCUSSION  .  35 

6.1.  Problems  5. 1-5. 8  35 

6.2.  Problems  5.9-5.12  36 

6.3.  Problem  5.10  (Wright  No.  9)  39 

6.4.  Problem  5.11  (Optimal  Control)  .  40 

6.5.  Problem  5.12  (Economic  Growth)  .  41 

7.  CONCLUSIONS  .  43 

ACKNOWLEDGMENTS  .  45 

REFERENCES  .  46 


ABSTRACT 


An  algorithm  is  described  for  solving  large-scale  nonlinear 
programs  whose  objective  and  constraint  functions  are  smooth  and  con¬ 
tinuously  differentiable.  The  algorithm  is  of  the  augmented  Lagrangian 
type,  involving  a  sequence  of  sparse,  linearly  constrained  subproblems 
whose  objective  functions  include  a  modified  Lagrangian  term  and  a  modified 
penalty  function. 

The  algorithm  has  been  implemented  in  a  general  purpose  Fortran 
code  called  MINOS /AUGMENTED.  The  system  is  intended  for  use  on  problems 
whose  Jacobian  matrix  is  sparse.  (Such  problems  usually  include  a  large 
set  of  purely  linear  constraints.)  The  bulk  of  the  data  may  be  assembled 
using  a  standard  linear-programming  matrix  generator.  Function  and  gradient 
values  for  all  nonlinear  terms  are  supplied  by  two  user-written  subroutines. 

Some  aspects  of  the  implementation  are  described  in  detail,  and 
computational  results  are  given  for  some  nontrivial  test  problems. 

Assuming  convergence  occurs,  the  work  involved  is  comparable  to  the 
solution  of  a  moderate  number  of  linear  programs  of  similar  size. 


A 


! 

! 


1.  INTRODUCTION 


The  work  reported  here  was  prompted  by  consideration  of  various  ways 
to  extend  the  linearly  constrained  optimization  code  MINOS  (Murtagh  and 
Saunders  [13])  to  include  the  capability  of  solving  nonlinearly  constrained 
problems.  In  particular  we  are  concerned  with  large,  sparse  problems,  in  the 
sense  that  each  variable  is  associated  with  relatively  few  constraints. 

Ignoring  sparsity  for  the  moment,  consider  the  model  problem 


minimize  f  (x) 

subject  to  £  (x)  =  0,  u 


(1.1) 


where  the  functions  of  x  are  assumed  to  be  twice  differentiable  with 
bounded  Hessians.  For  this  problem  the  algorithm  discussed  here  would  solve 
a  sequence  of  linearly  constrained  subproblems  of  the  form 


min  LCx.x^jX^.p)  =  f^(x)  ~  X^(£  ~  I)  +  yP  (X  ~  £)T( L  ~  £)  (1.2a) 

s.t.  jf  =  0,  _<  x  _<  u  (1.2b) 


where  Jf  is  a  linear  approximation  to  f^x)  at  sotne  point  x^*  (Thus 
_f  =  +  J^(x  -  x^)  where  f^  and  are  the  constraint  vector  and 

Jacobian  matrix  evaluated  at  x^.)  With  p  =  0,  subproblem  (1.2)  corresponds 
to  that  used  by  Robinson  [20]  .  The  same  subproblem  (with  p  ■  0)  is  used 
in  Phase  2  of  Rosen's  algorithm  [21], 

The  expression  (1.2a)  will  be  called  a  modified  augmented  Lagrangian. 
When  x^  and  are  taken  to  be  the  solution  and  corresponding  Lagrange 

multipliers  for  the  previous  subproblem,  Robinson  has  shown  for  the  case 
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0  =  0  that  the  sequence  will  converge  quadra t ical ly  to  a  solu¬ 

tion  of  the  original  problem  (1.1)  as  long  as  the  initial  pair  (>Cq,_\^) 
sufficiently  close  to  that  solution.  A  case  for  which  convergence  can  be 
expected  is  when  the  modified  Lagrangian  I.U  ,  x^ ,  0)  is  convex.  Since 

this  is  not  always  true,  the  penalty  term  involving  p  is  included  here 
to  ensure  that  the  Hessian  of  L(x,  x^,  p)  is  positive  definite  within 

an  appropriate  subspace.  It  also  Inhibits  large  discrepancies  between  f_ 
and  f,  thereby  discouraging  large  changes  in  x  in  each  subproblem  if 
the  nonlinearities  are  such  that  the  linearized  constraints  have  little 
meaning  far  from  the  point  of  linearization.  As  always,  the  intention  is 
to  allow  convergence  from  a  wider  range  of  starting  points.  Use  of  (1.2) 

represents  an  alternative  to  Phase  1  of  Rosen's  algorithm  [21]  in  which 

0  IT 

(1.2a)  is  replaced  by  f  (x)  +  —  p_f  f_  and  the  linearized  constraints 
_f  =  C)  are  deleted  from  (1.2b). 

The  reason  for  choosing  the  modified  penalty  in  (1.2a)  rather  than 
1  T 

the  conventional  ^  p—  —  will  become  clear  when  sparsity  is  reintroduced. 

1.1.  Subproblems 

It  has  been  argued  in  the  past  that  the  need  to  solve  linearly 
constrained  subproblems  is  a  drawback  of  methods  such  as  Robinson's.  How¬ 
ever  when  projection  (or  reduced-gradient  or  variable-reduction)  methods 
are  used  we  would  take  the  view  that  linearly  constrained  subproblems  are 
actually  easier  to  solve  than  the  unconstrained  subproblems  encountered 
in  other  Lagrangian-  and  penalty-based  methods.  (Certainly  the  implemen¬ 
tation  is  more  complex  but  with  linear  constraints  present  the  optimization 
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usually  takes  place  In  a  subspace  of  much  smaller  dimension.) 


For  a  certain  class  of  objective  functions,  the  development  of  MINOS 
has  opened  the  way  to  solving  large  linearly  constrained  problems  quite 
efficiently.  Hence  for  large  versions  of  problem  (1.1)  involving  a  sparse 
Jacobian  matrix  and  many  purely  linear  constraints,  it  is  natural  to  apply 
MINOS  to  the  corresponding  subproblems  (1.2).  The  resulting  extension  of 
MINOS  is  called  MINOS /AUGMENTED  and  is  documented  in  [14).  Our  aim  is  to 
describe  the  algorithm  used  and  some  details  of  its  practical  implementation, 
and  to  discuss  its  performance  on  some  nontrivial  problems. 

Note  that  the  Lagrangian  and  penalty  terms  in  (1.2a)  require  continual 
evaluation  of  the  nonlinear  constraint  functions  during  the  solution  of  (1.2). 

In  some  cases  this  may  be  expensive.  MINOS /AUGMENTED  therefore  allows  the  option 
of  setting  =  0  and  p  =  0  so  that  only  f^(x)  remains  in  (1.2a). 

Some  results  obtained  using  this  option  are  also  reported. 

The  MINOS  code  for  linearly  constrained  optimization  is  briefly 
described  in  Section  2,  and  Section  3  discusses  the  method  for  handling 
nonlinear  constraints.  Details  of  the  computer  implementation  are  described 
in  Section  4,  and  Sections  5  and  6  present  some  test  problems  and  a  discussion 
of  their  solution. 

standard  form: 

(2.1) 

(2.2) 


2.  BRIEF  DESCRIPTION  OF  MINOS 

MINOS  solves  problems  expressed  in  the  following 


T  T 

minimize  jF(x)  +  c_  x  +  d  % 


subject  to 


['] 


(2.3) 


where  A  is  m  by  n,  m  ^  n,  and  the  variables  are  partitioned  into  "non¬ 
linear"  and  "linear"  variables  x  and  ^  respectively.  (This  standard  form 
is  a  slight  generalization  of  the  one  normally  used  for  linear  programming 
problems;  it  emphasizes  the  fact  that  nonlinearities  in  the  objective  function 
often  involve  just  a  few  of  the  variables  x . ) 

For  numerous  practical  reasons  the  last  m  columns  of  A  form  the 
identity  matrix  I,  and  the  last  m  components  of  y_  are  the  usual  logical 
("slack"  or  "surplus")  variables. 

MINOS  uses  an  "active  constraint"  strategy,  with  the  general  con¬ 
straints  and  some  portion  of  the  bound  constraints  being  active  at  any  given 
time.  Thus  if  «  is  partitioned  as  [B  S  N]  where  N  is  a  set  of  "non- 
basic"  columns,  the  active  constraints  are  always  of  the  form 


The  first  part  of  this  equation  is  equivalent  to 


while  the  second  part  reading  indicates  that  the  nonbasic  vari¬ 

ables  x^  are  being  held  equal  to  one  or  other  of  their  bounds.  (The  com¬ 
ponents  of  b^  come  from  or  ^  as  appropriate  and  the  partition 

[Xg,  Xg,  Xjj]  is  some  permutation  of  [x,jf].) 
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The  remaining  columns  of  A  are  partitioned  into  "basic"  and 

"superbasic"  sets  B  and  S,  such  that  the  basis  matrix  B  is  square  and 

nonsingular.  The  corresponding  basic  and  superbasic  variables  x  and 

“B 

jtg  are  free  to  vary  between  their  bounds  during  the  next  iteration. 

It  can  readily  be  shown  that  an  optimal  solution  of  the  above  form 
exists  for  which  the  number  of  superbasic  variables  is  less  than  or  equal 
to  the  number  of  nonlinear  variables. 


2.1.  Some  Aspects  of  the  Algorithm  Used  In  MINOS 
The  operators 


B  S  N 

-B_1S 

n 

h-i 

_ 1 

,  z  = 

I 

0  - 

(2.5) 


will  be  useful  for  descriptive  purposes.  The  active  constraints  (2.4)  are 
of  the  form  Ax  =  b,  and  Z  happens  to  satisfy  AZ  =  0. 

Under  suitable  conditions  a  feasible  descent  direction  p  may  be 
obtained  from  the  equations 

ZTGZj.s  =  -ZTg  ,  p  =  Zps  (2.6) 


(see  [6]).  In  particular,  if  the  reduced  gradient  Z  £  is  nonzero,  and  if 

T 

the  reduced  Hessian  Z  GZ  is  positive  definite  (or  if  any  positive  definite 

T 

matrix  is  used  in  place  of  Z  GZ) ,  then  the  point  x  +  ap  lies  on  the 


active  constraints  and  some  scalar  a  >  0  exists  for  which  the  objective 
function  has  a  lower  value  than  at  the  point  x. 


Other  matrices  7.  exist  satisfying  A 7.  =  0,  hut  the  form  chosen 

above,  together  with  a  sparse  LI'  factorization  of  B,  allows  efficient 

T  -1 

computation  of  the  products  Z  g  and  Zp^..  (Neither  B  nor  7.  is 

X 

computed.)  A  positive-definite  approximation  to  Z  GZ  is  maintained  in 
T 

the  form  R  R  where  R  is  upper  triangular.  Quasi-Newton  updates  to  R 

lead  to  superl inear  convergence. 

Let  the  gradient  of  the  objective  function  (2.1)  be  the  vector 
T 

%  =  (gB  %  %)  •  If  2  satisfies 

T 

B  r  =  g  (2. 


it  is  easily  seen  that  the  reduced  gradient  is 


(2.8) 


Hence  in  linear  programming  terminology  the  reduced  gradient  is  obtained  by 
"pricing"  the  superbasic  columns  S.  This  is  a  cheap  operation  once 
has  been  computed. 

T  T 

Likewise  for  £  we  have  R  R£<,  =  -Z  g  and  then 


’  £b  " 

■  -»‘1s£s  ■ 

£  = 

"  z*s  " 

Es 

0 

(2.9) 


so  most  of  the  work  lies  in  solving  B_pB  =  -Spg.  (The  value  £^,  =  0  indi¬ 
cates  that  no  change  will  be  made  to  the  current  nonbasic  variables.  As 

T 

long  as  the  reduced  gradient  Z  £  is  nonzero,  only  the  variables  in  [B  S] 


- 
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are  optimized.  If  any  such  variables  encounter  an  upper  or  lower  bound 

they  are  moved  into  N  and  the  partition  [B  S)  is  suitably  redefined.) 

T 

Note  that  if  the  reduced  gradient  does  prove  to  be  zero  (Z  g  =  0) 
the  reduced  objective  has  reached  its  optimal  value.  If  we  compute 
j  =  g  -  (i.e.  the  usual  pricing  of  nonhasic  columns)  we  then  have 


1 

CO 

*—} 

1 

V 

Ib 

_  T 

S 

o 

= 

% 

VT  r 

N  I 

-  -5n- 

so  that  tt  and  o  are  exact  Lagrange  multipliers  for  the  current  active 

constraints.  The  components  of  o  indicate  whether  any  nonbasic  variables 

should  be  released  from  their  bounds.  If  so,  one  or  more  are  moved  from  N 

into  S  and  optimization  continues  for  the  new  set  [B  S].  If  not,  an 

optimum  has  been  obtained  for  the  original  problem. 

In  practice,  optimization  for  each  [B  S]  will  be  curtailed  when 
T 

Z  g  is  sufficiently  small,  rather  than  zero.  In  this  case  tt  will  be 
just  an  approximation  to  the  Lagrange  multipliers  for  the  general  constraints. 
The  accuracy  of  tt  will  depend  on  the  size  of  ||z  g||  and  on  the  condition 
number  of  the  current  basis  B. 


2.2.  Key  Points 

The  algorithm  implemented  in  MINOS  provides  a  natural  extension  of 
linear  programming  technology  to  problems  whose  objective  function  is  non¬ 
linear.  If  the  number  of  nonlinear  variables  is  moderate  (or  more  precisely, 
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if  the  number  of  superbasic  variables  and  hence  the  dimension  of  R  is 


moderate)  then  the  work  per  iteration  is  not  substantially  greater  than  for 
one  iteration  of  the  revised  simplex  method  on  the  same  data 


X 

X 

A 

y 

=  b, 

A  < 

y 

_  _ 

_ 

Here  we  assume  that  the  cost  of  evaluating  the  objective  function 
and  its  gradient  is  moderate  compared  to  manipulation  of  a  sparse  factori¬ 
zation  of  the  basis  matrix  B.  At  the  same  time  it  is  important  that  the 
step-size  a  be  determined  efficiently.  The  line-search  procedure  used 
in  MINOS  is  that  of  Gill,  Murray  et^  a^.  ([7]),  which  allows  the  user  to 
control  the  accuracy  of  the  search  by  means  of  a  parameter  ETA,  where 
0.0  ETA  <  1.0.  Even  with  a  relatively  accurate  search  (e.g.  ETA  =  0.01) 
the  number  of  function  and  gradient  evaluations  required  is  typically  very 
few  (e.g.  1,  2  or  3  per  search).  This  is  increasingly  beneficial  for  the 
algorithm  discussed  next,  where  the  objective  function  is  modified  to 
include  an  arbitrary  number  of  nonlinear  functions. 


3.  EXTENSION  TO  NONLINEAR  CONSTRAINTS 
3.1.  Statement  of  the  Problem 

It  is  assumed  that  the  nonlinearly  constrained  problem  can  be 
expressed  in  the  following  standard  form: 
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minimize 


0  T  T 

f(x)+<Li  +  lX  (3.1) 

subject  to  f^(x)  +  =  4  (m^  rows)  (3.2) 

A2x  +  A^  =  j>2  (m2  rows)  (3.3) 

l<[f  ]<«  m  =  m^  +  (3.4) 


1  IT 

where  _f(x)  =  [f  ( x) ,  ...  ,  f  (x) ]  .  The  first  variables  x  are 

again  called  "nonlinear  variables."  They  occur  nonlinearly  in  either  the 
objective  function  or  the  first  constraints.  There  may  be  purely 

linear  constraints,  given  by  (3.3).  As  before,  a  full  set  of  slack  variables 
is  included  as  the  last  m  components  of  the  "linear  variables"  y,  so 
that  general  equality  and  inequality  constraints  can  be  accommodated  in  (3.2) 
and  (3.3)  by  means  of  suitable  bounds  in  (3.4). 

We  shall  assume  that  the  functions  f*(x)  are  twice  continuously 
differentiable  with  gradients  ^(x)  and  bounded  Hessians  Gi(x),  i  =  0,1,...,  . 

We  shall  also  assume  that  the  1st  and  2nd  order  Kuhn-Tucker  conditions  hold 
for  a  local  minimum  x*  with  corresponding  A*. 

The  solution  process  consists  of  a  sequence  of  "major  iterations," 
each  one  involving  a  linearization  of  the  nonlinear  constraints  at  some 
point  x^»  corresponding  to  a  first-order  Taylor's  series  approximation: 

f1(x)  =  f^x^)  +  l*  “  2^)  +  0||  x  -  XjJI  2 


We  thus  define 


fCx.Xj^)  -  fO^)  +  JO^Hx  -  x^)  , 
or 

I  -  4  +  V-  -  **)  . 


(3.5) 


9 


whore 


.1  (x)  is  the  (m^  x  n  )  Jacobian  matrix  whose  ij-th  element  is 

3  fV'ix..  We  then  see  that 
J 

f  -  f  =  (f  -  fk)  -  J  (x  -  j^)  ( 3 .  h ) 

consists  of  the  higher  order  ("nonlinear”)  terms  in  the  Taylor's  expansion 
of  f  (x)  about  the  point  x^- 


3.2.  The  Linearized  Subproblem 


At 

the  kth  major  iteration  the  following  linearly  constrained  sub- 

problem 

is 

solved : 

minimize  L  (x,y,x,  ,  £ ,  p)  =  f°(x)  +  £Tx  +  dTy  -  _A?(f  -  D  +  T  p^-_  ~  O 

x,y  '  K 

(3.7) 

subject 

to 

1  +  AXZ  =  bx  , 

(3.8) 

A2x  +  A^y  *  b2  , 

(3.9) 

<  u  .  (3.10) 

The  derivative  of  the  objective  function  with  respect  to  x  is 

f“  =  S.(2L»*k*Ak*P)  =  £°(*)  +  £  -  (J-J^1^  -  p(f  -  f.)]  .  (3.11) 

We  see  that  the  nonlinearities  in  L  involve  x  but  not  y.  (In  contrast  the 

normal  penalty  function  would  involve  the  constraint  violation 

Jf  +  A^y  -  in  place  of  _f  -  JE. )  This  represents  a  vital  advantage  of 

the  modified  augmented  Lagrangian  (3.7),  since  it  means  that  each  linearized 

subproblem  has  the  same  number  of  nonlinear  variables  as  the  original 


l  < 


x 

1 
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problem.  The  dimension  of  the  reduced  Hessian  for  the  subproblem  is  there¬ 
fore  bounded  in  the  same  way  as  for  the  original  problem,  i.e.,  by  the 
dimension  of  £  (cf.  Theorem  1  of  [13]). 

The  use  of  a  penalty  term  to  ensure  the  augmented  Lagrangian 
maintains  a  positive-definite  Hessian  in  the  appropriate  subspace  was 
first  suggested  by  Arrow  and  Solow  [2]  and  adopted  later  by,  among  others, 
Hestenes  [10]  and  Powell  [16]  in  their  sequential  unconstrained  procedures, 
and  by  Sargent  and  Murtagh  [23]  in  conjunction  with  their  "variable-metric 
projection"  algorithm  involving  a  sequence  of  linearized  constraints.  The 
modified  penalty  term  has  not  been  used  elsewhere  since  no  distinction  has 
been  made  previously  between  linear  and  nonlinear  variables.  Note  that  the 
modified  penalty  is  identical  to  the  conventional  penalty  in  the  subspace 
defined  by  the  linearized  constraints. 

3.3.  Choice  of  A, 

The  choice  A^  =  0,  p  =  0  corresponds  to  simple  sequential  lineari¬ 
zation  of  the  nonlinear  constraints  with  no  additional  terms  to  f^(x)  in 
the  objective  function.  We  shall  call  this  the  'Newton  strategy,'  although 
it  should  not  be  confused  with  applying  Newton's  method  to  the  Kuhn-Tucker 
equations  for  a  solution  of  (3.1)-(3.4). 

Ideally,  should  be  as  close  as  possible  to  A*,  but  of  course 

the  optimal  multipliers  are  normally  unknown.  The  simplest  choice  is 
A^  *  X,  the  multipliers  corresponding  to  linearized  constraints  at  the 
solution  of  the  previous  subproblem.  As  we  shall  see,  this  choice  is  the 
best  of  several  alternatives.  For  convenience  suppose  there  are  no  linear 

a  m 

constraints,  so  that  A^  ■  £  is  the  solution  of  B  £  “  £5  at  t*ie  end  of 

T 

the  previous  major  iteration.  We  know  that  £  also  satisfies  S  £  ■ 
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(at  least  to  within  the  convergence  tolerance  used  for  the  suhprob  ]  em)  . 
We  thus  have 


T 

B 

\  = 

T 

S 

_  — 1 

h 

and  it  is  immaterial  which  variables  are  in  B  and  which  are  in  S.  N'ow 

g  is  zero  for  all  slack  variables  and  it  follows  immediately  that 

A,  =  0  if  the  ith  linearized  constraint  is  inactive.  The  choice  A,  =  ' 
i 

therefore  ensures  that  an  apparently  inactive  nonlinear  constraint  will 

T 

be  excluded  from  the  Lagrangian  term  ^(f  ~  f)  in  the  next  subproblem. 
This  is  a  desirable  property. 

It  may  seem  that  a  better  approximation  to  _X*  could  be  obtained 
by  evaluating  the  new  Jacobian  J(x)  which  is  required  anyway  for  the 
next  subproblem.  Let  the  resulting  "new"  [B  S]  be  denoted  by  [B  S] . 
One  possibility  is  to  define  A^  as  the  solution  of  the  least-squares 
problem 


'  iT  " 

\  ^ 

®B 

.  sT_ 

_  'NJ 

-  is  - 

where  the  rhs  is  still  the  "old"  gradient  vector  for  the  previous  augmented 
Lagrangian.  However,  this  least-squares  problem  would  be  very  expensive 
to  solve  for  large  problems.  Furthermore  it  is  not  guaranteed  that  A^  =  0 
would  result  where  desired. 

-T- 

A  cheaper  alternative  would  be  to  solve  B  jt  =  £  and  take 
^  *  ir,  but  then  A^  =  0  for  inactive  constraints  would  be  assured  only 
if  the  corresponding  slack  variable  happened  to  be  basic  and  not  superbasic. 

If  the  new  B  is  to  be  used,  the  method  of  Sargent  and  Murtagh  [23] 

shows  that 
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i>TA  «  &B  +  [I  0  0]Gl  Ax 

would  produce  the  correct  multipliers  for  the  solution  of  the  new  subproblem 
if  the  original  objective  and  constraints  were  quadratic  and  was  an 

adequate  approximate  to  the  Hessian  of  the  new  Lagrangian.  (See  equation 
(12)  in  [13].)  However,  this  again  is  not  a  practical  alternative  for 
large  problems. 

3.4.  Choice  of  p 

It  is  well  known  that  21*  need  not  be  a  local  minimum  of  the 
Lagrangian  function  (with  p  =  0) -  If  we  assume  that  Jfa*)  is  of  full 
rank,  then  _A*  exists  and  is  such  that 

L  (*>_*)  =  +  £T21  +  ~  XV  +  -  bj] 

is  stationary  at  (ic*,^.*),  but  L(x*,_^*)  may  well  display  negative 
curvature  in  21  at  21*  • 

The  most  that  can  be  said  [26]  is  that,  if  we  consider  the  con¬ 
straints  satisfied  at  x*  as  equalities  and  ignore  the  inactive  ones,  then 
a  necessary  (sufficient)  condition  that  x*  is  a  local  minimum  is 

Z(x*)T  (x*,A*)  “  0 

and 

Z(X*)1  (JC*,A*)Z(2C*> 

3  21 

is  positive  semidefinlte  (positive  definite),  where  Z(x*)  is  as  defined 
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in  equation  (2.5)  using  J(x*)  in  the  appropriate  part  of  A. 


Thus  if  we  restrict  our  search  to  he  linearly  constrained  sub¬ 
space  defined  by  we  do  indeed  seek  a  minimum  of  the  Lagrangian,  and 

we  may  expect  that  when  x^  is  sufficiently  close  to  x*  for  J(x^) 
to  be  close  to  J(x*)  we  may  minimize  (3.7)  with  p  =  0.  This  is  confirmed 
by  Robinson's  theorem  on  quadratic  convergence  [20]. 

Difficulty  arises  when  x^  is  far  removed  from  x*,  since  the 
linearized  constraints  may  define  a  subspace  where  perhaps  a  saddle-point 


would  be  closer  to  x*  than  a  minimum  would  be.  Successive  minima  of 


(3.7)-(3.10)  with  p  =  0  may  therefore  fail  to  converge  to  x* .  The 

-  x 

addition  of  a  penalty  term  p[^-f]  [_f-f]  imposes  the  correct  curvature 
properties  on  (3.7)  for  a  sufficiently  large  p  >  0. 


For  general  nonconvex  problems  it  is  not  practical  to  determine 
a  priori  what  the  appropriate  order  of  magnitude  p  should  be  (indeed 
p  =  0  is  often  adequate  even  in  the  nonconvex  case) .  The  more  important 
consideration  is  when  to  reduce  p  to  zero,  for  we  know  that  there  is  a 
radius  of  convergence  around  (x*,_X*)  within  which  Robinson's  theorem 


holds  for  p  =  0,  and  we  can  then  expect  a  quadratic  rate  of  convergence. 

Two  parameters  we  can  monitor  at  the  solution  x  to  each  linearized 
subproblem  are  the  constraint  violation  or  'row  error'. 


llf(x)  +  Ajjr  -  b-jJI  =  II  _f  (x)  -  ^(x,}^)  ||  , 

and  the  change  in  multiplier  estimates,  ||  -  _\^||  .  The  question  that 

arises  is  whether  these  can  be  used  to  provide  adequate  measures  of  con¬ 
vergence  toward  x*. 

For  simplicity,  consider  the  equality-constrained  problem 


where  the  functions  of  x  are  twice  continually  differentiable  with 
bounded  Hessians.  We  shall  assume  that  at  some  point  x*  the  Jacobian 
J(x*)  is  of  full  rank,  there  exists  a  A*  such  that  3f°/3x  =  J(x*)TA* 
and  the  reduced  Hessian  Z(x*)T  32L(x*, A*)/3x2  Z(x*)  is  positive 
definite  (i.e  the  sufficiency  conditions  are  satisfied  for  x*  to  be  a 
local  optimum) . 


Theorem  1. 

Let  (xR,  A^)  be  an  approximate  solution  to  PQ  and  let 
(x,A)  be  a  solution  to  the  linearized  subproblem 

S-j^ :  minimize  f°(x)  -  A^(f  -  f)  +  p  ( f  -  f)T(f  -  f) 

subject  to  f(x,x,  )  =  0 
- He  — 

^  —  ~  =  £.1  an<^  f.(x)  =  £_2>  then  (x,£)  is  also  a  solution  to  the 

perturbed  problem 

P!:  minimize  f°(x)  +  +  P£2)T  ( f  -  f) 

subject  to  f_(x)  -  £2 
for  sufficiently  small  and 


Proof.  If  (x.^)  is  a  solution  of  S  we  must  have  {_  =  0  and 

£°(x)  -  (J  -  Jk)T\  +  p(J  ~  Jk)  (1  ~  V  = 

where  is  the  Jacobian  at  x^  but  d «  A  and  A  are  evaluated  at  x. 

Adding  (J  -  J  to  both  sldes  and  inserting  the  expressions  for  ^  and 

£2  gives 

s°(x)  +  (J  -  Jk)T£1  +  p(j  "  Jk)T£2  =  jT- 

which  shows  that  (x,.X)  also  satisfies  the  conditions  for  a  stationary 
point  of  P^.  Now  it  can  be  shown  that  the  Hessians  for  the  Lagrangian 

*  T  - 

functions  of  S1  and  PL  differ  only  by  the  amount  p(J  -  Jk)  (J  -  Jfc) 

2 

at  the  solution  of  P.^  which  is  of  order  p  i|Axjlk  where  =  x  - 

Hence  for  sufficiently  small  and  '^xk»  H  tbe  reduced  Hessian  of 

S  is  positive  definite  at  x  then  by  continuity  the  reduced  Hessian  of 
P1  will  also  be  positive  definite,  thus  satisfying  the  sufficiency  con¬ 
ditions  for  a  local  minimum  of  P^  at  x.  D 

It  is  of  interest  to  examine  the  corresponding  result  for  the 
conventional  penalty  term. 

Theorem  2.  Let  (x^A^)  be  an  approximate  solution  to  P^  and  let 
(x,A)  be  a  solution  to  the  linearized  subproblem 


S2:  minimize  f^(x)  -  A^(f  ~  f)  +  y  pf^A 

subject  to  *  £. 

If  A  “  Ajj  s  and  *  ^2’  t^ien  (x«A)  is  als°  a  solution  to  the 

perturbed  problem 

?2:  minimize  f°(x)  +  £^(A  ~  D  +  P£^— 

subject  to  _f (x)  **  e~  • 

Proof.  Analogous  to  the  proof  of  Theorem  1.  □ 

Again  it  follows  that  if  and  are  sufficiently  small, 

(x,A)  will  be  within  the  radius  of  convergence  of  Robinson's  theorem  and  p 
can  safely  be  reduced  to  zero.  A  point  of  interest  is  that  problem 
appears  to  be  less  sensitive  than  P2  to  deviations  from  its  optimum. 

Thus,  let  Ax  be  an  arbitrary  small  change  to  the  solution  x  of  P^. 

The  objective  function  for  P^  then  differs  from  the  true  objective 
fO(x)  by  an  amount 

61  ”  (£1  +  P£2)T(-  “  * 

IfiJ  <  (II  ^11  +  P|l  e2ll  )  0||  Ax||  2  . 

For  P2  the  analogous  deviation  is 
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T  -  T 

2  =  l}(£  -  D  +  *  4^ 


=  ~  +  PI2  '  -  +  J'-x  +  Oli  Axil  “)  . 


i  62 1  <  (||  £lll  +  pH  c2II  )  Olhxll  2  +  oil  t,ll  2  +  r  II  JT£  II  II  Axil  . 


Since  6^  is  of  order  ||  Axil  while  62  is  of  order  II  Axil  ,  it 
appears  that  the  modified  penalty  term  in  S^  has  a  theoretical  advantage 
over  the  conventional  penalty  term  of  S^. 


3.5.  Summary  of  Procedure 

The  cycle  of  major  iterations  can  be  described  as  follows: 

Step  0.  Set  k  =  0.  Choose  some  initial  estimates  Xq>  Aq  an^  specify 
a  penalty  parameter  p  ^  0  and  a  convergence  tolerance  r  >  0. 


Step  1.  (a)  Given  x^,  and  p,  solve  the  linearly  constrained  problem 

(3.7)-(3.10)  to  obtain  new  quantities  x^-n*  ^k+1  and  £(*^+3) ■ 

(b)  Solve  BTjt  =  £g(xk+1)  . 

(c)  Set  =  the  first  components  of  it. 

Step  2.  (a)  Test  x^+i  for  convergence.  If  optimal,  exit. 

(b)  If  ,|i(W+AA+i"^i!l/(1+ll(^+i’2k+i)l,)  1  £c  and 

II  ]^+1  -  A^H  /(1  +  II  \+1H  >  iEc>  then  set  P  =  °- 

(c)  Relinearize  the  constraints  at  • 

(d)  Set  k  =  k+1  and  repeat  from  Step  1. 

This  procedure  would  not  be  complete  without  an  algorithm  for  increasing  the 
penalty  parameter  in  certain  circumstances.  In  Step  2(b)  of  the  present  imple¬ 
mentation,  we  raise  p  by  some  factor  if  the  relative  change  in  proves 

to  be  very  large. 


i 

i 


4.  COMPUTER  IMPLEMENTATION 
4.1.  Sparse  Matrices 

Using  equation  (3.5),  the  linearized  constraints  (3.8)  can  be 
expressed  in  the  form: 


Jk-  +  Ai*  =  ^  +  JiA  “4  <*.n 

where  =  f (x^  .  The  terms  on  the  right-hand  side  of  (4.1)  are  constant  and 
become  part  of  "b",  the  current  right-hand  side.  The  set  of  linear  constraints 
"Ax  =  b”  for  each  major  iteration  is  thus  of  the  form: 


J,  A,' 

X 

b,  +  J  x,  -  f. 

k  1 

-1  kr-k  — k 

-  A2  A3- 

_  1  „ 

—2 

(4.2) 


The  major  implication  of  A  being  large  and  sparse  is  that  efficient 
methods  are  available  for  forming  and  updating  an  LU  factorization  of  the 
basis  matrix  B  (cf.  equation  (2.4)).  (In  particular,  a  "bump  and  spike" 
algorithm  [9]  is  used  to  preserve  sparsity  at  each  refactorization  of  B. 

This  occurs  at  the  start  of  every  relinearization  and  occasionally  thereafter 
as  necessary.  At  each  intervening  change  of  basis  the  LU  factors  are  updated 
using  the  scheme  described  by  Saunders  [24]  to  preserve  both  sparseness  and 
numerical  stability.) 


4.2.  Infeasible  Subproblems 

One  of  the  difficulties  with  sequential  linearization  is  that  some 
of  the  linearized  subproblems  may  prove  to  be  infeasible.  In  particular, 
the  point  used  to  define  subproblem  k  is  usually  not  a  feasible 
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point  for  the  subprobl  t*r. .  However  it  will  tvpioaiiv  be  -  e.is :  :•  ’  e  for  : 
previous  subproblem  land  possibly  optimal).  This  tan  bo  used  to  aov  ru.ua 
in  the  manner  suggested  bv  Powell  [lb).  Thus  w  write  the  i i near i zed  .  o:.- 
straints  (4.1)  in  the  form 

V  *  -V  -  h  +  ’A  -  h  *  '1  0.3. 

where  ,£  is  a  perturbation  to  the  right-hand  side.  If  ( ,  v  )  is  tin 
final  feasible  point  from  subproblera  k  -  1,  we  can  show  that  it  will  also 
be  feasible  with  respect  to  the  new  linearized  constraints  (4.4)  if  .  =  1 
and  £  =  _f(x  )  -  f^(x  ,x,  ).  (Thus  £  is  the  value  of  f  -  f  at  the  end 

K  K  it-l 

of  the  previous  major  iteration.) 

In  MINOS/AUGMENTED  the  right-hand  side  of  (4.3)  is  initialized  with 
y  =  0.  If  the  subproblem  proves  to  be  infeasible  we  add  —  £  to  the  right- 
hand  side  and  continue  the  solution  process.  If  there  is  still  no  feasible 
solution  we  add  ^  ^  £  and  so  on.  This  simulates  the  sequence  of  values 

Y  =  ...  tending  to  1  as  desired. 

If  the  above  procedure  fails  after  10  modifications,  or  if  it  is 
not  applicable  (e.g.  when  k  =  0  or  the  previous  subproblem  was  infeasible), 
a  new  linearization  is  requested  as  long  as  at  least  one  minor  iteration 
has  been  performed.  Otherwise  the  algorithm  is  terminated  with  the  assump¬ 
tion  that  the  original  problem  itself  is  infeasible. 

In  [21],  Rosen  guards  against  infeasible  subproblems  by  linearizing 
perhaps  only  some  of  the  nonlinear  constraints,  namely  those  that  have  been 
active  or  reasonably  close  to  active  at  any  earlier  stage.  This  alternative 
could  be  implemented  in  MINOS /AUGMENTED  by  adjusting  the  bounds  on  the  slack 
variables  associated  with  the  linearized  constraints. 
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4.3.  Use r  Options 

Various  implementation  options  are  discussed  in  the  following 
sections.  Capitalized  keywords  at  the  head  of  each  section  illustrate 
the  Input  data  needed  to  select  any  particular  option.  Fuller  details 
are  given  in  the  user's  manual  [14]. 

4.4.  Subroutines  CALCFG  and  CALCON 

VERIFY  OBJECTIVE  GRADIENT 

VERIFY  CONSTRAINT  GRADIENTS 

As  In  the  linearly  constrained  version  of  MINOS,  a  user-written 
subroutine  CALCFG  is  required  to  calculate  the  objective  function  f^*(x) 
and  its  gradient.  The  Lagrangian  terms  in  (3.7)  are  calculated  internally. 

The  user  also  supplies  a  subroutine  CALCON  to  define  the  constraint 
vector  f_(x)  and  the  current  Jacobian  J(x).  The  nonzeros  in  J  are 
returned  column-wise  in  an  output  vector  and  must  be  in  the  same  order 
as  the  corresponding  entries  in  the  MPS  file  (see  below)  . 

Subroutine  CALCON  is  called  every  time  the  constraints  are  linearized. 
Except  for  Newton's  method  it  is  also  called  one  or  more  times  each  line- 
search  to  allow  computation  of  (3.7)  and  (3.11).  The  expense  of  evaluating 
the  constraints  and  their  gradients  should  therefore  be  taken  into  account 
when  specifying  the  linesearch  accuracy. 

Note  that  every  function  and  Jacobian  element  is  computed  in  every 
call  to  CALCON.  Although  some  of  these  values  may  effectively  be  wasted 
(e.g.  if  some  of  the  constraints  are  a  long  way  from  being  active),  the 

resulting  simplicity  of  the  subroutine  from  the  user's  point  of  view  cannot 
be  overemphasized. 
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Since  the  programming  of  gradients  is  notoriously  prone  to  error, 
the  VERIFY  option  is  an  essential  aid  to  the  user.  This  requests  a  check 
on  the  output  from  CALCFG  and/or  CALCON,  using  finite  differences  of  f^(x) 
or  f_(^0  along  the  coordinate  directions.  The  check  is  performed  at  the 
first  feasible  point  obtained  (where  feasibility  is  with  respect  to  the 
first  linearized  subproblem).  This  point  will  not  satisfy  the  nonlinear 
constraints  in  general,  but  at  least  it  will  satisfy  the  linear  constraints 
and  the  upper  and  lower  bounds  on  x.  Hence  it  is  usually  possible  to 
avoid  singularities  in  the  nonlinear  functions,  both  in  the  gradient  check 
and  in  subsequent  iterations. 


4.5.  Jacobian  Option 

JACOBIAN  =  SPARSE  or  DENSE 

The  submatrices  A^,  A 2>  A^  and  vectors  b^,  b^  in  equation  (4.2) 
are  constant  data  and  so  may  be  entered  using  a  standard  MPS  input  file, 
as  in  linear  programming,  whereby  only  the  nonzero  coefficients  and  their 
row  locations  are  entered  column-by-column .  Since  we  envisage  that  the 
Jacobian  submatrix  J  will  also  be  large  and  sparse  we  use  the  same 
scheme  for  recording  the  row  and  column  locations  of  the  nonzeros.  Thus 
(with  JACOBIAN  =  SPARSE)  the  sparsity  pattern  of  J  is  entered  as  part 
of  the  MPS  file.  The  corresponding  numerical  values  in  the  MPS  file  may 
be  genuine  coefficients  (if  they  are  constant)  or  else  dummy  values, 
such  as  zero.  Each  call  to  subroutine  CALCON  subsequently  replaces  all 
dummy  entries  by  their  actual  value  at  the  current  point  x. 


Of  course  the  intention  here  is  to  allow  use  of  standard  matrix 


generators  to  specify  as  much  of  the  constraint  matrix  as  possible.  Pin¬ 
pointing  the  nonzeros  of  J  by  name  rather  than  number  has  the  usual 
advantages,  and  in  subroutine  CALCON  some  code  of  the  form 

LJAC  =  LJAC  +  1 
G(LJAC)  =  ... 


is  usually  adequate  to  define  the  next  nonzero  in  a  column  of  the  Jacobian, 
without  explicit  reference  to  any  row  or  column  numbers.  Nevertheless, 
the  user  is  effectively  required  to  give  the  sparsity  pattern  twice  (in 
the  MPS  file  and  in  CALCON),  and  it  is  essential  that  mismatches  be 
avoided.  At  present  the  VERIFY  option  is  the  only  aid  to  detecting 
incompatibility. 

In  the  interest  of  simplicity,  the  option  JACOBIAN  =  DENSE  allows 
J  to  be  treated  as  a  dense  matrix.  In  this  case  the  MPS  file  need  not 


specify  any  elements  of  J,  and  sii> routine  CALCON  can  use  assignment 
statements  of  the  form  G(I,J)  =  ...  to  specify  J^  by  row  and  column 
number.  The  danger  of  mismatches  is  thereby  eliminated,  but  the  storage 
requirements  may  be  excessive  for  large  problems.  It  may  also  give  rise 


to  an  unnecessarily  large  "bump"  in  the  basis  factorizations. 


Partial  Completion 


COMPLETION  -  PARTIAL  or  FULL 


"Partial  completion"  is  a  compromise  between  the  two  extremes  of 
relinearizing  after  each  linesearch  and  solving  each  subproblem  to  high 
accuracy  ("full  completion"). 


The  idea  of  attaining  only  partial  completion  at  each  major  iteration 
can  be  accommodated  effectively  via  the  convergence  tolerances.  MINOS  uses 
relatively  loose  tolerances  for  minimizing  the  reduced  objective,  until  it 
appears  that  the  optimal  partition  [B  S  N]  has  been  identified.  The 
partial  completion  option  is  effected  by  terminating  a  major  iteration  at 
this  stage. 

Otherwise  for  full  completion  the  normal  optimization  procedure  is 

continued  using  tighter  tolerances  to  measure  the  change  in  sc,  the  size 

T 

of  the  reduced  gradient  (j|  2  gll/JMI) ,  etc.  This  usually  gives  only  small 
changes  in  x  and  t t_  without  changing  the  partition  [B  S  N). 

An  alternative  means  for  achieving  partial  completion  for  early 
major  iterations  is  via  the  MINOR  ITERATION  limit  (see  Section  4.8) . 


4.7.  Lagrangian  Option,  Penalty  Parameter 

Newton  Strategy:  LAGRANGIAN  NO 

PENALTY  PARAMETER  0.0 

With  this  option  the  constraint  functions  and  gradients  are  evaluated  only 
once  per  major  iteration. 

Augmented  Lagrangian:  LAGRANGIAN  YES 

PENALTY  PARAMETER  p  (p  >_0) 

Here  the  constraints  and  Jacobian  are  evaluated  as  often  as  the  objective. 
Evaluation  of  the  augmented  Lagrangian  and  its  gradient  with  p  >  0  is 
negligibly  more  expensive  than  with  p  -  0  . 
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4.8.  Convergence  Conditions 


MAJOR  ITERATIONS 

60 

MINOR  ITERATIONS 

40 

RADIUS  OF  CONVERGENCE 

e 

c 

(« 

1 

O 

H 

ROW  TOLERANCE 

e 

(~ 

10~6) 

Apart  from  terminating  within  each  major  iteration,  we  also  need  a 
terminating  condition  for  the  cycle  of  major  iterations  (Step  3,  Section  3.5), 
The  point  (x^,^)  is  assumed  to  be  a  solution  to  the  nonlinearly 
constrained  problem  (3.1)-(3.4)  if  both  the  following  conditions  are 
satisfied: 

1.  (x^.y^)  satisfies  the  nonlinear  constraints  (3.2)  to  within  a  pre¬ 

defined  error  tolerance,  i.e. 


I  fi(xk)  +  -  e^bj  _<  er(l  +  IKx^,^)  II) ,  i  *  1, . .  ..m^ 


2.  (x^'X^)  satisfies  the  first-order  Kuhn-Tucker  conditions  for  a 

solution  to  the  linearized  problem. 


The  tolerance  parameter  is  specified  by  the  user,  and  was 

set  equal  to  10  ^  for  most  of  the  test  problems  described  in  the  subsequent 
sections. 

If  the  "partial  completion"  option  is  used,  then  once  these  two 
criteria  are  satisfied,  a  switch  to  "full  completion"  is  invoked  to  obtain 
an  accurate  solution  for  the  current  subproblem,  as  well  as  for  any  possible 


further  linearizations. 


The  error  tolerance  (  is  used  to  define  a  radius  of  convergence 
about  within  which  Robinson's  theorem  is  assumed  to  hold.  If 

the  row  error  defined  above  and  the  relative  change  in  Lagrange  multipliers 
between  successive  subproblems  are  both  less  than  t  in  magnitude  then 
the  penalty  term  is  dropped  (i.e.  .  is  set  to  0.0). 

The  MINOR  ITERATION  limit  is  used  to  terminate  each  linearized  sub- 
problem  when  the  number  of  linesearches  becomes  excessive.  The  limit  of 
AO  was  used  in  all  the  numerical  experiments.  A  much  small  number  would 
result  in  more  frequent  use  of  expensive  housekeeping  operations  such  as 
the  refactorization  of  B  in  Step  3,  Section  3.5.  Similarly  a  much  larger 
number  may  be  wasteful;  if  significant  changes  to  x  have  occurred  then 
a  new  linearization  is  appropriate,  while  if  there  has  been  little  progress 
then  updating  the  Lagrangian  information  gives  some  hope  of  more  rapid 
progress . 

It  must  be  noted  that  for  some  choices  of  x„,  and  p  the 

—0  —0 

sequence  of  majoc  iterations  may  not  converge.  The  MAJOR  ITERATION  limit 
provides  a  safeguard  for  such  circumstances. 

For  a  discussion  of  linearly  constrained  Lagrangian  methods  and 
their  drawbacks  see  Wright  [26],  pp.  137-147. 

In  the  present  implementation  of  MINOS /AUGMENTED  the  only  recourse 
would  be  to  restart  with  a  different  initial  x  or  a  larger  value  for 
the  penalty  parameter  p  (or  both) . 


5.  TEST  PROBLEMS 


Most  of  the  text  examples  reported  here  appear  In  the  published 
literature.  The  last  two  examples  are  new  and  are  described  in  detail. 
They  can  be  made  arbitrarily  large  by  increasing  one  parameter. 

5.1.  Colville  No.  2 

This  well  known  problem  is  one  of  the  more  testing  of  the  Colville 
series  of  problems  [3]  and  has  been  used  frequently  to  compare  different 
algorithms  [1],  [8],  [20],  [23].  It  has  a  cubic  objective  function  and  15 
quadratic  constraints.  Even  in  this  small  problem  the  variables  can  be 
partitioned  into  linear  and  nonlinear  sets,  of  dimension  10  and  5 
respectively. 

a)  Feasible  starting  point  . 

b)  Infeasible  starting  point  . 

5.2.  Colville  No,  3 

This  problem  has  a  quadratic  objective  function  of  five  variables 
and  three  quadratic  constraints.  It  also  has  upper  and  lower  bounds  on 
all  the  variables,  and  upper  and  lower  limits  on  the  constraints.  These 
can  be  accommodated  effectively  by  using  the  BOUNDS  and  RANGES  options 
in  the  MPS  file;  the  BOUNDS  option  allows  variables  to  be  nonbasic  at 
their  upper  or  lower  bound,  and  the  RANGES  option  assigns  both  upper 
and  lower  bounds  to  the  slack  variables  associated  with  the  constraints 
(thus  allowing  the  right-hand  side  to  range  between  bounds) . 


a)  Feasible  starting  point  • 


b)  Infeasible  starting  point  . 

5.3.  Colville  No.  8 

This  is  a  typical  process  design  problem,  where  some  of  the  vari¬ 
ables  are  determined  by  solving  nonlinear  equations.  It  has  3  independent 
variables  and  7  constraints. 


5.4.  Powell's  Problem  [17] 

This  has  5  variables  and  3  constraints.  Although  small,  this  is  a 
good  test  problem  as  the  nonlinearities  in  the  constraints  are  quite 
significant . 


minimize 

|  subject  to 


Starting  point : 


exp(x  x^x  x  ) 

2  2  2  2  2 

x  +  xf  +  x,  +  X7  +  x,  ■  10  > 

1  2  3  4  5 

x^  -  5x^Xj.  =  0  , 

3  3 

x^  +  x2  *  -1  • 

(-2,  2,  2,  -1,  -1). 


5.5.  Power  Scheduling 

This  is  a  comparatively  large  test  problem  reported  recently 
by  Biggs  and  Laughton  [4],  with  79  variables  and  91  constraints  (all  non¬ 
linear).  It  also  has  upper  and  lower  bounds  on  some  of  the  variables. 
Although  all  the  variables  and  constraints  are  nonlinear,  the  linearized 
constraint  matrix  (equation  4.3)  is  quite  sparse  with  on  average  a 

little  under  6  nonzero  row  entries  per  column.  Treating  it  as  a  dense 
matrix  could  result  in  a  "bump"  of  79  columns,  which  is  clearly  undesirable. 
A  number  of  minor  discrepancies  between  Biggs  and  Laughton's  paper  and 
the  original  statement  of  the  problem  [25]  were  resolved  by  using  the 
original  data. 

5.6.  Launch  Vehicle  Design 

This  problem  appears  in  Bracken  and  McCormick's  book  on  nonlinear 
programming  applications  [5]  and  also  appears  in  [22].  There  are  12  linear 
constraints  and  10  nonlinear  constraints,  and  the  Jacobian  of  the  nonlinear 
constraints  is  quite  sparse  (density  23%) , yielding  an  overall  matrix 
density  of  15%.  All  25  variables  are  nonlinear. 

5.7.  Quartic  with  Quartlc  Constraints 

This  problem  appears  in  Pierre  and  Lowe  [15].  Only  a  few  terms 
are  quartic,  the  remainder  being  predominantly  quadratic.  It  has  20 
variables  (all  nonlinear)  and  17  constraints,  13  of  which  are  nonlinear. 


5.8.  Dembo  No.  7 


This  is  one  of  Dembo' s  set  of  8  Geometric  Programming  test 
problems  [28];  in  particular,  it  required  the  most  computation  time  in 
Dembo's  results.  Other  authors  have  reported  difficulty  with  the 
problem  (Powell  [29],  Coope  and  Fletcher  [27]).  There  are  16  variables 
(all  nonlinear)  and  19  general  constraints  (11  of  them  nonlinear). 

The  solution  has  a  few  primal  and  dual  degeneracies,  but  it  is  essentially 
at  a  vertex  of  the  constraint  space  (i.e.,  a  vertex  of  the  final 
linearization) . 

5.9.  Wright  No.  4  [26] 

This  is  a  highly  nonlinear  non-convex  problem  for  which  four  local 
minima  have  been  determined . 

2  2  3  ^4 

minimize  (x^-1)  +  (x^x^  +  (x2~x3)  +  (x^x^)  +  (x^-x57 

2  3 

subject  to  +  x2  +  x^  «  2  +  3^2  , 

O 

x«  -  x  +  x.  *  -2  +  2/2  , 

^34 

X1X5  "  2  ‘ 

Starting  points; 

(a)  (1,1, 1,1,1) 

(b)  (2, 2, 2, 2, 2) 

(c)  (-1,3, -1/2, -2, -3) 

(d)  (-1,2, 1,-2, -2) 

(e)  (-2, -2, -2, -2, -2) 
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Local  optima: 


x*(l)  =  (1.11663,  1.22044,  1.53779,  1.97277,  1.79110)1 
x* (2)  =  (-2.79087,  -3.00414,  .205371,  3.87474,  ~.716623)T 
x* (3)  =  (-1.27305,  2.41035,  1.19486,  -.154239,  -1.57103)1 
x* (4)  =  (-.703393,  2.63570,  -.0963618,  -1.79799,  -2.84336)T 

5.10.  Wright  No.  9  [26] 

This  is  another  highly  nonlinear  example. 

minimize  lOx^x^  -  6x^x^  +  X2X^  +  ^  sin(x^  -  x^)  +  x^x^x^ 

subject  to  x^  +  x^  +  x^  +  x^  +  x^  20  » 

2 

x.x,  +  x.xc  >  -2  , 

1  3  4  5  — 

2 

x.x.  +  10x,x.  >  5 
2  4  15  — 

Starting  points: 

(a)  (1,1, 1,1,1) 

(b)  (1.091,  -3.174,  1.214,  -1.614,  2.134) 

Local  optima: 

x*(l)  -  (-.0814522,  3.69238,  2.48741,  .377134,  .173983)1 
x*(2)  -  (1.47963,  -2.63661,  1.05468,  -1.61151,  2.67388)1 

With  the  barrier  trajectory  algorithm,  Wright  [26]  obtained  convergence 
to  x*(l)  from  (a)  and  convergence  to  x*(2)  from  (b) .  Note  that  starting 
point  (b)  was  originally  chosen  to  cause  difficulty  for  the  barrier 
algorithm  and  other  methods  that  retain  feasibility  throughout. 
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5.11.  Optimal  Control 

This  problem  investigates  the  optimal  control  of  a  spring,  mass 
and  damper  system.  It  is  adapted  from  [19].  While  it  is  acknowledged 

that  there  may  be  simpler  ways  of  solving  the  problem  by  taking 

specific  advantage  of  the  nature  of  the  constraints,  it  serves  the  present 
purpose  of  providing  a  large, sparse  test  problem. 

T 

1  2 

minimize  f(x,y,u)  *  —  ^  x 

1  t=0 

subject  to  xt+i  =  xt  +  0*2yt  I 

yt+1  =  yt  -  O.oiy^  -  0.004xt  +  0.2ut  >  t  =  0,...,  T-l , 

-0.2  <  ut  <  0.2  I 

yt  >  -1.0  ' 

x0  =  10’  y0  =  °*  yT  =  °  ' 

Starting  point:  x^  =  0,  y^  =  -1,  t  =  1,...,  T. 

For  the  results  below  we  set  T  =  100.  There  are  thus  202  nonlinear 
variables  (xt»  yt>  c  “  0,...,  100)  and  100  linear  variables 
(ufc,  t  =  0,...,  99).  There  are  also  100  quadratic  constraints  and  100 
linear  constraints.  Note  that  the  nonlinear  constraints  are  very  sparse, 
with  only  4  nonzeros  per  row. 


The  solution  is  dominated  to  a  large  extent  by  the  lower  bound 

on  v ^ ;  the  optimal  y^  decreases  with  increasing  t  and  remains  at 

-1.0  from  t  =  20  to  t  =  40,  and  then  increases  again,  goes  slightly 

positive  and  settles  to  0.0  at  t  =  100.  The  corresponding  values  of 

x  can  be  calculated  directly  from  the  linear  constraint  xt+j  =  xt  +  0.2yt- 

2 

The  optimal  value  of  the  objective  is  j!xj[  /2  =  1186.382. 


5.12.  Economic  Growth  Model  (Manne  [11]) 

This  is  a  model  for  optimizing  aggregate  consumption  over  time. 

The  variables  are  C^,,  1  and  which  represent  consumption,  investment 

and  capital  in  time  period  t  for  t  *  1,...,  T. 


Utility  function: 


Nonlinear  constraints: 


T 

maximize  T  8  log  C 
t=i  1  c 


a  K  >  C  +  I 
t  t  -  t  t 


1  <  t  <  T 


Linear  constraints: 


Bounds : 


Kt+1  ^  Kt  +  Xt  ’ 


gKT  1  h 


K1  “  +  Ko 


Kt  >  X0  +  K0 


Ct^C0 


1  < 


It  <  (1.04)  IQ 


1  <  t  <  T 


t  <  T 
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Da  t  a : 


s  =  0.95, 


b  =  0.25, 


0 . 0  3  , 


Cq 


xo  -  °-° 


K0  =  3.0, 


.3  =  r.  except 


=  .  ‘  '(It) 


i  - 


( 1+g)  ^  ^  where 


(C()  +  I0)/K0 


The  objective  function  (regarded  as  a  minimand)  and  the  nonlinear  con¬ 
straints  are  both  convex  and  separable.  The  same  is  true  of  the  preceding 
optimal  control  problem.  These  examples  should  therefore  be  useful  test 
problems  for  more  specialized  convex  programming  algorithms. 

For  test  purposes  we  have  used  T  =  100,  which  gives  a  problem 
with  200  general  constraints  and  300  variables.  The  optimal  value  of 
the  utility  function  is  9.287547.  All  general  constraints  are  active 
at  the  solution,  and  the  first  74  upper  bounds  on  I ^  are  also  active. 

It  should  be  mentioned  that  specialized  methods  are  known  to 
economists  for  solving  this  problem,  with  and  without  the  upper  bounds 
("absorptive  capacities")  on  the  variables  1^.  However  in  more  practical 
circumstances  the  model  would  be  imbedded  in  a  much  larger  modi.  !  for 
which  an  analytical  solution  is  not  known.  If  the  larger  model  contains 
no  additional  nonlinearities,  the  performance  of  MINOS/AUGMENTED  should 
degrade  only  in  proportion  to  the  problem  size. 


6.  RESULTS  AND  DISCUSSION 


MINOS /AUGMENTED  is  implemented  in  standard  Fortran.  Various 
options  can  he  selected  at  run-time  by  means  of  a  SPECS  file,  and  -n 
initial  point  can  be  specified  by  an  INITIAL  BOUNDS  set  in  the  file 

containing  the  constraint  data.  The  latter  is  defined  in  the  well-known 
MPS  format  used  by  commercial  mathematical  programming  systems. 

The  following  parameter  values  were  used  throughout: 

LINF.SEARCH  PARAMETER  ETA  =  0.1  (moderately  accurate  search) 

RADIUS  OF  CONVERGENCE  =  0.01  (f  in  Section  3.5) 

c 

ROW  TOLERANCE  =  10"6  (c  in  Section  4.7) 

r 

MINOR  ITERATIONS  LIMIT  =  40  (not  active  on  small  examples) 

Run-times  are  reported  below  in  order  to  allow  comparison  among  various 
algorithmic  options. 

6.1.  Problems  5. 1-5. 8 

These  examples  were  solved  on  a  CDC  Cyber  70.  In  all  cases  con¬ 
vergence  was  obtained  with  zero  penalty  parameter  p.  The  results  are 
summarized  in  Table  6.1.  In  general  the  partial  completion  option  converged 
more  rapidly  than  full  completion.  However  example  5.4  illustrates  that 
fewer  major  iterations  are  likely  if  subprobleras  are  solved  accurately 
once  the  correct  subspace  has  been  identified.  This  was  observed  in 
several  other  cases  and  is  probably  explained  by  the  discussion  of 
in  Section  3.3.  In  terms  of  total  run-time  the  Newton  strategy  was  often 
superior,  but  it  failed  to  converge  on  problems  5.4  and  5.5.  This  deficier 
becomes  more  prominent  in  the  examples  below. 
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Problem  5.8  was  run  with  two  different  minor- Iterat  ion  limits 


(3  and  40,  which  could  have  been  3  and  15  without  altering  the  comparison). 
The  results  illustrate  that  terminating  major  iterations  earlv  can  some¬ 
times  help  rather  than  hinder. 

6.2.  Problems  5.9-5.12 

The  results  for  these  examples  were  obtained  on  an  IBM  370/168 
using  the  Fortran  H  Extended  compiler  with  full  optimization  (OPT  =  3) . 
Computation  was  performed  in  double  precision,  but  the  constraint  data 
were  stored  in  single  precision,  including  in  the  linearization  of 

f^(x) .  This  limits  the  achievable  constraint  error  to  about  10  but 
that  is  usually  adequate  in  practice.  Full  completion  was  used  throughout. 

Problem  5.9  (Wright  No.  4) 

This  highly  nonlinear  problem  illustrates  the  difficulties  discussed 
in  Section  3.4  when  no  penalty  term  is  used.  The  Newton  strategy  and  the 
Lagrangian  algorithm  with  P  =  0  both  gave  rise  to  subproblems  which 
changed  radically  each  major  iteration. 

The  results  using  the  Lagrangian  algorithm  with  p  =  10  and 
p  =  100  are  shown  in  Table  6.2. 

Infeasible  subproblems  were  encountered  with  starting  point  (e) 
using  the  penalty  parameter  p  =  10,  but  the  procedure  discussed  in 
Section  4.6  successfully  overcame  this  difficulty.  Case  (e)  was  also  the 
only  one  for  which  the  larger  p  was  important  in  stabilizing  progress  from 
the  starting  point  to  the  solution. 


Completion 


P.C.  Partial  Completion 
F.C.  Full  Completion 


6.3.  Problem  5.10  (Wright  No.  9) 

Again  the  Newton  strategy  and  the  Lagrangian  algorithm  with  c  *  0 
failed  to  converge.  Results  for  the  Lagrangian  algorithm  with  o  “  10 
and  p  =  100  are  shown  in  Table  6.3. 


Starting  point 

(a) 

(b) 

p 

100 

1 

1 

10 

100  | 

10 

Major  itns 

12 

1 

1 

1 

1 

9 

5  j 

19 

Total  itns 

92 

1 

1 

1 

71 

32  j 

201 

Functions 

183 

1 

1 

1 

146 

61  ! 

386 

Solution 

x*(l) j 

x*(l> 

x* ( 2 ) j 

x*  (3) 

Table  6.3.  Results  for  Test  Problem  5.10 


A  value  of  p  =  10  is  almost  too  low  for  starting  point  (b) ,  the 
subproblem  solutions  changing  radically  as  they  did  for  p  =  0,  but  finally 
converging  to  a  new  local  optimum,  x*(3)  =  (-.07427,  -3.69170,  2.48792, 
0.37693,  0.18446)T. 

In  general  the  results  for  these  last  two  problems  are  inferior  to 
those  obtained  by  Murray  and  Wright  (12],  [26]  with  their  trajectory 
algorithms,  in  terms  of  total  function  evaluations.  However,  the  difference 
is  less  than  a  factor  of  4  in  all  cases,  and  averaged  2.2  over  the  seven 
starting  points. 
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6.4.  Problem  5.11  (Optimal  Control) 

Despite  the  large  size  of  this  problem  both  the  Newton  strategy  and 
tue  Lagrangian  algorithm  with  p  =  0  converged  rapidly. 


Method 

N 

L  (p  =  0) 

Major  itns 

6 

6 

Total  itns 

254 

247 

Functions 

213 

203 

Time  (secs) 

10.55 

11.56 

Table  6.4.  Results  for  Test  Problem  5.11 


Recall  that  procedure  N  evaluates  the  constraint  functions  only  once  per 
major  iteration  (in  this  case  6  times  compared  to  203  times  for  procedure  L) 
If  f.00  were  more  costly  to  compute, the  time  advantage  would  be  that  much 
greater.  The  Lagrangian  algorithm  displayed  insensitivity  to  nonzero 
values  of  p  in  the  range  0.01-10.0,  taking  the  same  iterations  and  cal¬ 
culations  as  shown  in  Table  6.4. 
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6.5.  Problem  5 .12  (Economic  Growth) 


Although  this  is  also  a  convex  problem  the  Newton  strategy  led  to 
oscillation  and  no  convergence.  The  Lagrangian  algorithm  did  converge 
rapidly  with  p  =  0.  Without  the  upper  bounds  1^  £  (1.04)CIg  it  required 
11  major  iterations,  355  minor  iterations,  859  function  calculations  and 
34.3  seconds,  and  there  were  99  superbasic  variables  at  the  optimal  solution. 
However  when  these  bounds  were  imposed,  the  optimal  number  of  superbasics 
was  only  25  and  convergence  was  obtained  in  7  major  iterations,  183  minor 
iterations,  497  function  calculations  and  11.9  seconds.  This  illustrates 
the  gains  that  are  made  when  the  presence  of  constraints  reduces  the  di¬ 
mensionality  of  the  optimal  subspace. 

As  an  experiment  on  the  effect  of  p  on  the  rate  of  convergence  the 
problem  was  solved  several  times  with  different  values  of  p  in  the  range 
10  ^  <  p  £  1.0.  (The  tolerance  ££  for  dropping  p  was  set  to  zero, 
thus  forcing  p  to  stay  constant  for  each  run.)  The  results  are  shown  in 
Figure  1.  This  is  a  plot  of  minor  iterations  versus  log  of  the  constraint 
violation  or  "row  error,"  loglnll  f£x)  +  A  y  -  b  ||  ,  immediately  following 

relinearization.  The  dots  represent  the  end  of  each  major  iteration. 
Initially  these  occur  every  40  iterations  (the  MINOR  ITERATIONS  limit)  but 
later  subproblems  were  solved  accurately  before  the  limit  was  reached.  In 
fact  the  number  of  minor  iterations  reduces  rapidly  to  only  one  or  two  per 
subproblem  as  convergence  is  approached.  This  behavior  was  also  observed 
in  all  of  the  preceding  examples. 

It  can  be  seen  that  higher  values  of  p  give  lower  row  errors  at 
the  end  of  the  first  major  iteration  (as  we  would  expect),  but  they  lead 
to  consistently  slower  convergence.  It  is  interesting  to  note  that  rapid 
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igure  1.  Solution  of  problem  5.12  using  various  values  for  the  penalty 

parameter  p.  The  constraint  violation,  log,_||f(x)  +A,y- b,  ||  , 

Xu - 1  —1  00 

is  plotted  against  minor  iteration  number.  Dots  signify 
the  end  of  each  major  iteration. 
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convergence  does  occur  ultimately  In  all  cases  (except  for  P  =  1.0 
which  was  terminated  after  500  iterations).  However  this  is  not  until  the 
correct  active  constraints  have  been  identified,  by  which  time  x^  is 
very  close  to  the  optimum  and  the  penalty  term  is  having  a  negligible 
effect  on  the  Lagrangian  and  its  reduced  gradient. 

These  results  confirm  that  the  benefit  of  Robinson's  proof  of 
quadratic  convergence  for  the  case  p  =  0  cannot  be  realized  unless  p 
is  actually  reduced  to  zero  as  soon  as  precautions  allow. 

The  dotted  line  in  Figure  1  shows  the  result  for  p  =1.0  (the 

worst  case)  with  set  to  0.01,  allowing  p  to  be  reset  to  zero  at  the 

start  of  major  iteration  3.  (Note  that  on  the  highly  nonlinear  examples  5.9 

and  5.10,  the  same  value  of  e  ensured  that  p  was  not  set  to  zero  until 

c 

a  near  optimum  point  was  reached.  This  was  the  desired  effect  since  the 
multiplier  estimates  were  changing  radically  in  the  early  major 

iterations.) 

It  will  be  seen  in  Figure  1  that  the  row  error  increases  sharply 
once  p  is  reset  to  zero.  This  is  consistent  with  the  algorithm  suddenly 
being  free  to  take  a  large  step.  We  could  therefore  regard  the  first  two 
major  iterations  as  having  served  to  verify  that  the  problem  is  only 
mildly  nonlinear. 


7.  CONCLUSIONS 

Many  real-life  optimization  problems  originate  as  linear  programming 
models  that  are  not  quite  linear;  i.e.  they  contain  simple,  differentiable 
functions  in  the  constraints  and  objective,  but  otherwise  the  constraint  set 


is  large,  sparse  and  linear.  For  such  problems  the  Jacobian  matrix  is  ais' 


likely  to  be  sparse,  and  the  strategy  of  solving  a  sequence  of  linearly  con¬ 
strained  subprobleras  has  many  advantages.  This  is  clear  from  the  results 
obtained  for  the  larger  test  problems  above. 

For  convex  problems  the  Lagrangian  term  in  the  objective  of  the  sub¬ 
problems  is  usually  necessary  to  ensure  convergence.  The  Newton  strategy 
performed  adequately  without  it  on  some  occasions,  but  in  general  the  saving 
in  run  time  will  seldom  be  substantial. 

For  non-convex  problems,  both  the  Lagrangian  and  the  penalty  term 
are  clearly  useful  but  the  actual  choice  of  the  penalty  parameter  g 
remains  a  critical  decision  for  the  user.  In  practice,  optimization  problems 
are  often  solved  repeatedly  on  a  production  basis.  In  this  situation  it  is 
worthwhile  experimenting  with  different  values  of  the  parameters  and  toler¬ 
ances  (and  perhaps  with  the  Newton  strategy) .  However  the  case  of  an  isolated 
problem  with  unknown  characteristics  is  no  less  important.  A  conservative 
(high)  value  of  p  is  then  virtually  mandatory.  One  of  our  aims  has  been 
to  minimize  the  risk  of  subsequent  slow  convergence  by  determining  an 
opportune  time  to  reduce  p  to  zero.  The  analysis  in  Section  3.4  has 
suggested  a  practical  procedure  for  achieving  this  aim.  In  particular, 
the  "radius  of  convergence"  tolerance  ec  (applied  to  both  the  constraint 
violation  and  the  relative  change  in  the  estimates  of  A_)  allows  early 
removal  of  p  in  moderately  nonlinear  cases  but  otherwise  retains  it  until 
convergence  to  a  local  solution  is  imminent. 

The  results  reported  here  should  provide  a  benchmark  for  measuring 
the  performance  of  other  algorithms  and  their  implementations.  Clearly  no 
single  algorithm  can  be  expected  to  perform  uniformly  better  than  all  others 
in  such  a  diverse  field  as  nonlinear ly  constrained  optimization.  As  it 
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happens,  MINOS/AUGMENTED  has  proved  to  be  reasonably  efficient  on  small, 
highly  nonlinear  problems,  but  more  importantly,  it  represents  an 
advance  in  the  development  of  general-purpose  software  for  large-scale 
optimization. 
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